Overview

Foundation models have shown impressive capabilities in recent years. In practical applications, however, they still suffer from insufficient accuracy, slow knowledge updates, and a lack of answer transparency. Retrieval-Augmented Generation (RAG) emerged to address these issues. RAG connects a foundation model to external knowledge repositories so that answers are grounded in retrieved content. This improves the accuracy of question-answering (QA) systems and mitigates the hallucination and timeliness problems associated with foundation models.
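
The retrieve-then-generate flow described above can be illustrated with a minimal, generic Python sketch. It does not use the RAG SDK APIs: the names KNOWLEDGE, retrieve, and build_prompt are hypothetical, the knowledge repository is a small in-memory list, and a simple bag-of-words cosine similarity stands in for a real vector model. The resulting prompt would then be passed to a foundation model, which is omitted here.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge repository (illustration only).
# A real system would store embeddings in a vector database.
KNOWLEDGE = [
    "RAG connects foundation models to external knowledge repositories.",
    "RAG SDK provides modular APIs for retrieval and knowledge management.",
    "Multi-modal parsing handles text, tables, PDFs, and images.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, docs, top_k=1):
    """Return the top_k documents most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        docs,
        key=lambda d: cosine(q_vec, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, passages):
    """Assemble a prompt that asks the model to answer from the passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

question = "What does RAG connect foundation models to?"
passages = retrieve(question, KNOWLEDGE)
prompt = build_prompt(question, passages)
print(prompt)
```

Because the model is asked to answer from the retrieved passages, updating the knowledge repository changes the answers without retraining the model, and the passages can be shown to the user as the source of the answer.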

RAG can turn a basic foundation model into a domain-specific one at low cost, speeding up the deployment of foundation models. RAG SDK, a knowledge augmentation kit developed for the Ascend platform, aims to make retrieval-augmented generation efficient. It helps users create QA systems tailored to specific scenarios, improving the practicality and reliability of those systems.

RAG SDK Introduction

RAG SDK, an Ascend-developed RAG kit for large language models (LLMs), provides a suite of modular APIs, including domain-specific fine-tuning data generation, retrieval, and knowledge management for vector models. It is designed for developers building upper-layer applications and intentionally excludes service-level functions such as user authentication and permissions.

RAG SDK Functions

RAG SDK provides capabilities for rapidly developing QA systems on the Ascend platform. Features such as multi-modal document parsing and knowledge repository management lower the barrier to creating LLM applications and enable seamless integration with open-source ecosystems.

  • Quick setup: Modular APIs can be used as needed. Preset end-to-end workflow templates allow users to start QA services quickly with minimal coding effort.
  • Multi-modal parsing: Multiple types of files, such as text, tables, PDFs, and images, can be parsed, providing diverse corpora for LLMs.
  • High-performance inference: The Ascend affinity models are optimized and accelerated to deliver higher throughput and shorter response times.

Intended Audience

This document is intended for:

  • Huawei technical support engineers
  • Technical support engineers from channel partners